1. INTRODUCTION

Mental health is a serious issue in modern society. The term usually refers to a person’s
mood, thoughts, and behavior [1]. Poor mental health is one of the leading causes of suicide [2].
A personality disorder is a type of mental health disorder that interferes with ways of thinking,
understanding situations, and relating to others. A person with a personality disorder can behave
self-destructively in any situation [3]. There are several types of personality disorders, including
narcissistic, paranoid, schizoid, antisocial, and borderline. A 2019 World Health Organization (WHO)
survey shows that nearly a billion people have a mental disorder [4]. People with mental disorders are
frequently stigmatized in the community, which can have a negative impact on them [5].

Narcissism is a common mental disorder [6]. The characteristics of narcissistic personality disorder
(NPD) are grandiosity, a need for praise, and a lack of empathy [7]. It is becoming more common
in everyday life, particularly on social media. Those suffering from NPD frequently struggle to
maintain relationships and keep their jobs. NPD causes problems in many areas of life, including
relationships, work, school, and finances. People with NPD may become dissatisfied and disillusioned when
they do not receive the special favors or adoration they believe they are entitled to. Others may not
enjoy being around them, so their relationships are not rewarding [8], which may cause them to act
violently and aggressively [9].

Early personality disorder diagnosis is critical for understanding people’s psychological conditions
[10]. As a result, we can take steps to help people with personality disorders [11]. Diagnosis is based on
appearance or behavior, which usually makes it a complex and difficult task that requires expertise in
psychology [12]. It usually relies on screening tests, which are costly and time-consuming for a
large population. Furthermore, diagnostic procedures have the unintended consequence of discouraging
healthy people from participating. Consequently, psychological problems frequently go unrecognized or
unaddressed. Machine learning (ML) is one method for predicting mental health disorders quickly and
accurately. In recent years, ML techniques have been used in a variety of medical research projects,
particularly in biomedicine and neuroscience, to gain a better understanding of mental health issues [13].

ML enables computers and other computing systems to learn autonomously from past data and
improve without explicit human programming [11]. ML is based on the creation of computer programs that
can access data and learn from it on their own. It is very effective in the healthcare industry, where
large amounts of data are available. With enough data, the resulting prediction model can be more
accurate, less prone to human error, and can reduce the time required for diagnostics. ML is a significant
advancement in computer science and data processing that can be used to improve almost any service [14].
It finds patterns in the data, allowing for easier and more accurate identification and prediction, and is
considered a highly useful tool for predicting mental health [15]. ML requires input in the form of
features or variables. One of the most important issues is determining which features lead to the best
solution. Datasets used in the ML process typically contain redundant and irrelevant features, which
cannot improve accuracy [16], have no effect on the learning model [17], and may even degrade the
learning model’s performance [16]. As a result, relevant features must be chosen.

Numerous studies on feature selection (FS) techniques have been conducted recently to create datasets
with the most pertinent features for the best model performance [18] and reduced computation time [19]. FS
typically selects only the informative features of the given input and discards noisy data, which aids
the target application [20]. Previous research has reported many FS methods that improve ML
performance, including relief [21], minimum-redundancy-maximum-relevance FS
(m-RMR) [22], information gain [23]-[25], and gain ratio (GR) [23], [26], [27]. Each method employs a
distinct technique or strategy for identifying the best and most relevant features. To our knowledge, no
research has reported the best FS method for each ML approach, and no research compares ML
performance or discusses the use of FS in NPD cases.

In this study, we compare the performance of three FS scenarios (all features,
information gain, and GR) across three ML methods (support vector machine (SVM), random forest
classifier (RFC), and Naive Bayes). A feature is selected when its weight is greater than a threshold of
0.05. This threshold is frequently used in studies that require good accuracy at a lower level [28], [29]. The
research findings provide an overview of the ML approaches’ ability to predict NPD using features
generated by FS techniques.

The rest of this paper is organized as follows. Section 2 provides a brief overview of the research
method. Section 3 presents the experimental results and compares the classification and FS
methods. Finally, section 4 presents the conclusion and summarizes the effectiveness of the
approaches.


2. METHOD 

This section outlines the procedures followed in this research. The research begins with
determining the problem, followed by data collection. Before the data is entered into the model, it
must be pre-processed: the data is cleaned and prepared for the next step, feature selection using
information gain and GR. The selected features are then passed to the ML methods (SVM, RFC, and
Naive Bayes). The overall procedure is illustrated in Figure 1.


2.1. Dataset 

We selected 8376 records from the data collection process, with ages ranging from 14 to 50, 44
features, and 1 class label. The data used in this research were obtained from Open Psychometrics
(https://www.kaggle.com/datasets). The dataset consists of 5330 records with the class label “yes” and 3046
with the class label “no.” The data are cleaned in a preprocessing step and reduced to the most
important attributes by feature selection. The method has three steps: preprocessing, feature selection,
and applying ML to the prediction process. The 44 features used in this study are listed in
Table 1.
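
The class counts above imply a moderately imbalanced dataset, which matters later when the classifiers are compared. As a small worked example (not part of the original experiments), the following sketch computes the class shares and the accuracy of a trivial classifier that always predicts the majority label; the variable names are illustrative only.

```python
# Class counts reported in the dataset description.
n_yes, n_no = 5330, 3046
n_total = n_yes + n_no

# Share of each class label in the dataset.
share_yes = n_yes / n_total
share_no = n_no / n_total

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(share_yes, share_no)

print(f"total={n_total}, yes={share_yes:.1%}, no={share_no:.1%}")
```

The two shares are about 63.6% and 36.4%, so any reported accuracy is best read against a majority-class baseline of roughly 63.6%.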


Bulletin of Electr Eng & Inf, Vol. 13, No. 2, April 2024: 1383-1391 


Bulletin of Electr Eng & Inf ISSN: 2302-9285


Figure 1. The design of the proposed method (diagram labels recovered from the figure: feature selection, information gain, processing time, comparative performance analysis)


Table 1. Features of dataset

Feature    Description
f01        Gender
f02        Age
f03-f42    Statements 1-40
f43        Elapse
f44        Score
f45        Class


2.2. Feature selection 

The source dataset often consists of various features, some of which may not be important
for classification [30], [31]. Unimportant features that depend on other attributes reduce prediction
accuracy. A feature selection strategy must be used to overcome this and reduce the dimensionality of
the feature set. Feature selection methods fall into several groups, namely filter, wrapper, and hybrid
methods. The method most often used for feature selection is the filter method [32]. This work uses the
information gain and GR methods to find the feature subsets; both are popular filter models [29]. A
weight is computed for each feature in the dataset, and features are selected by comparing that weight
with a cutoff value known as the threshold. This research keeps features whose weight is greater than 0.05.
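
The filter step can be sketched in a few lines. The example below is a minimal illustration, assuming the feature weights are already available as a dictionary; the function name `select_features` and the toy weights are hypothetical and are not taken from the experiments.

```python
def select_features(weights, threshold=0.05):
    """Rank features by weight (descending) and keep those strictly above the threshold."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, weight in ranked if weight > threshold]

# Toy weights, invented for illustration only.
toy_weights = {"a": 0.17, "b": 0.03, "c": 0.057, "d": 0.05}

selected = select_features(toy_weights)
print(selected)
```

Because the comparison is strict, a feature whose weight equals the threshold exactly (feature `d` in the toy data) is not selected.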


2.2.1. Information gain 

Information gain is the change in class entropy from a previous state to a state in which an attribute
value is known. It is applied here to measure how relevant each feature is. The method originates in
decision tree induction, where information gain is used as a criterion for choosing attributes. The
information gain method is faster in the feature selection process than other methods [33]. Features
that carry the most information are ranked highly in this method, and the others are ranked low. It is
computed using the following steps:

Step 1: compute the entropy (H) of class C before observing attribute A.

Step 2: compute the entropy of class C once attribute A has been observed. This is the entropy of the
subsets of the main data set induced by the values of A. The first two steps are extremely important
since they supply the entropies needed in the next step to obtain the information gain.

Step 3: compute the information gain of attribute A, which is the difference between the entropy before
observing attribute A and the entropy after observing it.

The final feature set that will be used for classification is chosen after computing the gain of
each feature and applying a threshold.
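
The three steps above can be sketched as follows. This is a minimal illustration, assuming a discrete attribute and entropy measured in bits; it is not the implementation used in the experiments, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in class entropy obtained by observing one discrete attribute."""
    total = len(labels)
    # Step 1: entropy of the class before observing the attribute.
    h_before = entropy(labels)
    # Step 2: weighted entropy of the class within each attribute-value subset.
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    h_after = sum(len(s) / total * entropy(s) for s in subsets.values())
    # Step 3: information gain is the difference between the two entropies.
    return h_before - h_after

# Toy data, invented for illustration only.
classes = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # tells nothing about the class

print(information_gain(informative, classes))
print(information_gain(uninformative, classes))
```

In the toy data the first attribute removes all class uncertainty (a gain of one bit), while the second leaves it unchanged (a gain of zero), which is why the first would be ranked higher.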


2.2.2. Gain ratio

The GR normalizes the information gain of an attribute by the entropy of the split that the attribute
induces on the data [21], [34]. When selecting features, the GR considers the number and size of the
resulting subsets [35]. This normalization lessens the bias of information gain toward attributes with
many values. In other words, the GR chooses an attribute based on the number and size of branches, and
corrects information gain by accounting for the intrinsic information of a split.
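
As a hedged sketch of this normalization, the example below divides the information gain by the entropy of the attribute's own value distribution (the intrinsic, or split, information). It assumes discrete attributes and is not the implementation used in the experiments; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(s) / total * entropy(s) for s in subsets.values())
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0

# Toy data, invented for illustration only.
classes = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]    # two branches of size 2
id_like_attr = ["1", "2", "3", "4"]   # four branches of size 1

print(gain_ratio(binary_attr, classes))
print(gain_ratio(id_like_attr, classes))
```

Both toy attributes have the same information gain (one bit), but the identifier-like attribute splits the data into four branches and so receives a gain ratio of 0.5 instead of 1.0. This is the bias correction described above.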


2.3. Classification 

Classification is a popular data mining technique. Classifiers such as decision trees and neural
networks each use a different method to assess the available data and produce a prediction.

Once the feature selection stage has produced the ranked features that affect the class label, the next
stage is classification. The accuracy, error rate, and time required in the classification process are
compared for SVM, RFC, and Naive Bayes. The classification models are validated using k-fold
cross-validation, a method commonly used to evaluate models on training sets [36]. Figure 2 shows the
k-fold cross-validation procedure.

Figure 2. The procedure of k-fold validation
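
The partitioning behind k-fold cross-validation can be sketched as follows, using the dataset size from section 2.1 and k = 10, the value used in the experiments. This is a simplified illustration: it splits the indices in order, whereas practical tools usually shuffle (and often stratify) the data first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set exactly once."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(8376, k=10))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each record appears in exactly one test fold and in the training data of the other nine rounds, so every record contributes to the final accuracy estimate once.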


2.4. Performance measurement 

This study uses the confusion matrix to measure accuracy and error rate.
A confusion matrix of size n × n associated with a classifier, where n is the number of classes, shows
the predicted and actual classifications. The confusion matrix for n = 2 is shown in Table 2.


Table 2. Confusion matrix

                     Predicted positives     Predicted negatives
Actual positives     True positives (TP)     False negatives (FN)
Actual negatives     False positives (FP)    True negatives (TN)


Prediction accuracy and classification error are used to evaluate and compare the classifiers.
Both values are obtained from the confusion matrix in Table 2 and calculated using
(1) and (2).


\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

Meanwhile, to calculate the error, the following equation is used.

\[ \text{Misclassification (Error) Rate} = \frac{FP + FN}{TP + TN + FP + FN} \tag{2} \]
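
As a worked example of (1) and (2), the sketch below computes both measures from the four cells of a binary confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (1): share of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Equation (2): share of misclassified instances."""
    return (fp + fn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts, invented for illustration only.
tp, tn, fp, fn = 50, 40, 6, 4

acc = accuracy(tp, tn, fp, fn)
err = error_rate(tp, tn, fp, fn)
print(acc, err)
```

Since both equations share the same denominator, the two measures always sum to 1, so either one determines the other.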

3. RESULTS AND DISCUSSION

In the pre-processing stage of data mining, feature selection chooses a subset of the initial
attributes. In this research, we use the information gain and GR feature selection techniques to
determine the selected features based on weight values. The first step is calculating the weight of
each feature. Next, the features are ranked by weight.

We evaluate the classification algorithms in three scenarios, measuring accuracy, error rate, and
computational time in each. In the first scenario, all features are used. In the second scenario,
feature selection is carried out using the information gain technique. In the third scenario, feature
selection is carried out using the GR technique. The main aim is a comparative analysis of the use of
all features and two different feature selection techniques, and to identify the feature selection
technique that recommends the most relevant features. The results of feature selection can be seen in
Table 3.


Table 3. Selected features

Information gain        GR
Feature    Weight       Feature    Weight
f44        0.971        f44        1.000
f42        0.170        f42        0.173
f32        0.152        f32        0.160
f11        0.150        f29        0.157
f14        0.143        f38        0.155
f35        0.141        f37        0.155
f34        0.132        f34        0.151
f29        0.132        f11        0.150
f15        0.127        f06        0.144
f37        0.127        f14        0.140
f09        0.119        f35        0.138
f10        0.119        f15        0.136
f08        0.115        f09        0.134
f41        0.115        f27        0.125
f12        0.111        f08        0.120
f33        0.111        f10        0.119
f38        0.111        f22        0.118
f07        0.107        f41        0.116
f27        0.098        f25        0.115
f22        0.098        f12        0.111
f06        0.096        f33        0.110
f36        0.095        f07        0.109
f20        0.090        f40        0.105
f31        0.090        f04        0.102
f03        0.085        f36        0.101
f28        0.085        f20        0.099
f21        0.079        f30        0.092
f18        0.078        f21        0.091
f04        0.077        f31        0.089
f25        0.071        f03        0.087
f30        0.067        f28        0.085
f40        0.066        f17        0.820
f39        0.066        f18        0.077
f17        0.062        f23        0.074
f23        0.058        f05        0.072
f13        0.057        f39        0.072
f16        0.057        f16        0.068
                        f13        0.065


Table 3 shows that the information gain technique produces 37 selected features, while the
GR technique produces 38 features. With the information gain technique, seven features have a
weight of less than 0.05: f26, f05, f19, f24, f02, f01, and f43. With the GR
technique, six features have a weight of less than 0.05: f26, f19, f24, f02, f01,
and f43. These features are removed and not used in the classification process. The information gain
technique therefore retains one feature fewer than the GR technique, which means that it removes more
features at the same threshold.
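
The feature counts can be checked with simple set arithmetic. The sketch below uses only the feature names listed above and assumes, as in Table 1, that the 44 candidate features are f01 to f44.

```python
# The 44 candidate features f01..f44 (the class label is excluded).
all_features = {f"f{i:02d}" for i in range(1, 45)}

# Features reported as falling below the 0.05 threshold for each technique.
ig_removed = {"f26", "f05", "f19", "f24", "f02", "f01", "f43"}
gr_removed = {"f26", "f19", "f24", "f02", "f01", "f43"}

ig_selected = all_features - ig_removed
gr_selected = all_features - gr_removed

print(len(ig_selected), len(gr_selected))
print(sorted(gr_selected - ig_selected))
```

The two selected sets have 37 and 38 members, and they differ only in f05, which GR keeps and information gain removes.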

The selected features are used as the final features, with no further feature removal, for training and
testing to measure classifier efficiency. We develop the models using WEKA (version 3.9.2) in this
research. The WEKA platform simplifies the construction of several data analysis techniques and offers
a Java API [37]. It provides tools for classification, regression, clustering, attribute selection,
association rule mining, and visualization of the dataset.

To assess classification performance, we apply k-fold cross-validation, which separates the dataset
into k subsets. Ten folds are used in this study, each of approximately the same size. In each of the
10 rounds, 9 folds are used for training and 1 fold for testing, so that every fold serves as the test
set once. Three classifiers available in WEKA (SVM, Naive Bayes, and RFC) were used to determine which
classifier performs best. The confusion matrix of each model is used to evaluate how well it performed.
The results of the accuracy comparison of the three scenarios can be seen in Table 4.


Table 4. A comparison of accuracy for NPD

Methods        All features (%)    Information gain (%)    GR (%)
RFC            99.93               99.96                   100
SVM            99.07               97.68                   97.79
Naive Bayes    86.52               91.28                   91.34


The test results show that all three ML techniques predict NPD with high accuracy when
a feature selection strategy is used. Feature selection increased the accuracy of RFC and Naive Bayes,
whereas the accuracy of SVM decreased from 99.07% with all features to 97.68% with information gain and
97.79% with GR. This shows that the choice of features affects the classification results. Feature
selection selects the features that are able to provide the best results in classification [38]. Of the
two feature selection techniques, GR yields the higher accuracy for all three classifiers.

In all test scenarios, the RFC approach is the most accurate.
The accuracy of the RFC method using all features, information gain, and GR is 99.93%, 99.96%,
and 100%, respectively, so RFC with GR has the best accuracy. Meanwhile,
the Naive Bayes method has the lowest accuracy of the three methods. A likely reason is that Naive
Bayes is constrained by class imbalance: because it is sensitive to the class distribution, its
predictions can fail for minority-class instances.

Furthermore, the error rates of the three scenarios are compared. The error rate is
calculated using the confusion matrix. The results of the error rate comparison can be seen in Figure 3.

Figure 3. A comparison of the error rate of the methods
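
Because (1) and (2) share a denominator, the error rate is the complement of the accuracy. The sketch below derives error rates from the accuracies in Table 4 under the assumption that Figure 3 reports exactly this complement; only the RFC values are confirmed by the text, and the other values are derived here rather than read from the figure.

```python
# Accuracy values (%) copied from Table 4, ordered as
# (all features, information gain, GR).
accuracy_pct = {
    "RFC": (99.93, 99.96, 100.0),
    "SVM": (99.07, 97.68, 97.79),
    "Naive Bayes": (86.52, 91.28, 91.34),
}

def error_pct(acc):
    """Error rate (%) as the complement of accuracy (%), rounded to 2 decimals."""
    return round(100.0 - acc, 2)

error_rates = {
    method: tuple(error_pct(a) for a in accs)
    for method, accs in accuracy_pct.items()
}

for method, errs in error_rates.items():
    print(method, errs)
```

For RFC this reproduces the error rates of 0.07, 0.04, and 0 discussed for Figure 3.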


Based on Figure 3, the RFC method has the smallest error rate. The error rate of RFC using all
features, information gain, and GR is 0.07%, 0.04%, and 0%, respectively. In this experiment, we also
evaluate the efficiency of the scenarios by measuring the consuming time, that is, the time required to
build a predictive model. The comparison of consuming time is shown in Figure 4.

Figure 4. A comparison of consuming time of the methods


Figure 4 presents the consuming time of the ML methods before and after feature selection
using information gain and GR. The test results show that Naive Bayes with information gain has
the shortest consuming time, only 0.22 seconds. This is because Naive Bayes selects the most probable
class by calculating, for each class, the likelihood of the observed attribute values. Meanwhile, the
SVM approach takes more time than the other two methods, because it is more complicated: it uses a
kernel and searches for a separating hyperplane, which makes the processing time longer [39].

In addition, the test results show that the number of features affects the consuming time. The
information gain technique produces fewer features than GR, so it has a
shorter consuming time than using all the features or the GR technique.


4. CONCLUSION 

The test results show that the GR feature selection technique yields the highest accuracy for RFC and
Naive Bayes compared to using all features and information gain, while SVM is most accurate with all
features. Among the ML methods, RFC with GR has the highest accuracy, 100%. In the consuming-time test,
the Naive Bayes method with information gain is the fastest, at 0.22 seconds. The test results also
show that the number of features analyzed greatly affects the consuming time. Therefore, further
research is needed to improve feature selection performance so that it produces relevant and important
features.